


Journal of Biological Engineering Research and Review, 2021; 8(2): 280-285 
ISSN: 2349-3232 Online 
Available online at www.biologicalengineering.in/Archive 


Review Article 


Web Data Mining: Sentiment Analysis of Amazon Product Reviews 


Biswajit Biswas, Manas Kumar Sanyal, Tuhin Mukherjee 


Department of Business Administration, University of Kalyani, West Bengal, India 


*Corresponding author. E-mail: biswajit.biswas0012@gmail.com (B. Biswas) 








ARTICLE INFO: 
Article History: 
Received: 30/06/2021 
Revised: 28/10/2021 
Accepted: 11/12/2021 
Published: 31/12/2021 
Keywords: 
Web-scraping; Data Mining; 
Feedback analysis; Review 
mining. 


Copyright: © 2020 Biswas et 
al. This is an open-access article 
distributed under the terms of 
the Creative Commons 
Attribution License (CC BY 4.0). 


Citation: Biswas et al, Web 
Data Mining: Sentiment Analysis 
of Amazon Product Reviews J 
BiolEngg Res & Rev, Vol. 8, 


Abstract: Sentiment analysis of customer feedback in e-business is a rapidly growing research
area in Business Intelligence Systems (BIS). E-market platforms display a huge number of
reviews for every product, both numeric ratings (quantitative) and textual reviews
(qualitative). These reviews are very helpful for new buyers, sellers, retailers and
manufacturers, but analyzing this review data manually for business decisions is very
difficult or impossible. The authors proposed and developed a customer feedback (review)
analysis system that extracts real-time online review data from an e-marketing website and
then analyzes the sentiment of the extracted reviews. Marketers and customers can utilize
review mining and feedback analysis, which influence the wider public by shifting its beliefs
about a particular product. The data used in this research work are real-time product reviews
(Galaxy S20 5G) collected from an e-marketing site (amazon.in) through web scraping with
Python. The authors implemented a relative sentiment analysis of the retrieved online
reviews. This work provides a feedback analysis of mobile phone (smartphone) reviews,
dividing them into a positive class, a neutral class and a negative class. The result of this
analysis is visualized with a bar chart, a word cloud and numerical values, which a customer
can use in decision-making before purchasing a new smartphone. Sellers can also use this
system to expand their business.
INTRODUCTION


In the 21st century, owing to the technological revolution, the use of
digital marketing/e-commerce has increased exponentially.
A huge number of products are available on e-market
sites, and it is very difficult for a customer to select a product
from an immense number of similar products. The online
shopping style of customers is greatly affected by the
development of online marketing sites. Customers use the
reviews that exist on online marketing sites to make the
right choice before a purchasing decision. A customer who
has purchased a product can give an opinion on that
product through a rating, a comment or both.

A massive number of online reviews, both ratings and
comments, are shown on e-marketing sites. It is very difficult
for a customer to read all of these reviews and extract the
truthful particulars about the product. To overcome this
challenge and help new buyers, feedback analysis is
used. This technique is mainly used to extract truthful
information from the content of the reviews. This research work
will help new customers/buyers make a decision
before selecting a product from e-market sites.


RELATED WORK 


In rapidly growing and spreading digital marketing, most
retailers and manufacturers want to sell their products
through online marketing platforms. Customers can easily
purchase their favorite products from these platforms and
share their views/reviews on the websites. To analyze an
individual's subjective review, fuzzy logic can be employed
to obtain reliable, meaningful insight [4]. The demand for
e-commerce is rising, and product reviews posted by
customers give significant feedback for attracting potential
customers and influence new buyers' decisions
about their purchases [7].


Sentiment analysis is a procedure for determining the
subjectivity and the strength of polarity of specific comments. A
survey confirms that 81% of e-market customers frequently
do their online shopping by searching for their favorite
products. Analyzing feedback manually makes it challenging to
reach a conclusion.


Today the number of textual reviews available on websites is
increasing vigorously. Numerous online marketing
sites permit customers to buy products as well as post
comments on the procured products, which results in an
ever-growing collection of subjective reviews in natural
language (NL). To mine the overall sentiment or opinion
polarity from all of them, sentiment analysis can be
used for making a decision. In practice, analyzing these
reviews manually is impossible. Therefore, this efficient
approach has an important role in solving these issues [3].


A significant concept in fuzzy logic lies in the notion of
semantic variables. Fuzzy set theory offers a conventional
approach to modeling the intrinsic fuzziness between
sentiment polarity classes. These characteristics make
fuzzy logic suitable for the sentiment classification of product
reviews [9].


Due to the continuous development of digital marketing
platforms, traditional marketing and feedback systems have lost
their importance; for that reason, online review systems
play an important role in purchasing decisions. The
problem with these systems is how to analyze the reviews and
extract accurate information from them [1].


Feedback analysis, or review mining, is a process that
analyzes customers' feelings, attitudes, emotions and
reactions towards certain entities. The data for that research
work were gathered from Amazon.com. The Amazon reviews
dataset was analyzed using Python programming,
which classified the text reviews as positive, neutral or
negative according to the customers' textual feedback [5].


The arena of sentiment analysis and review mining offers
extensive possibilities for research. It helps to find the
overall polarity of a massive volume of data in very little time,
and the result can be used for further analysis for the
enhancement and improvement of the specific product as
well as of the business concerned [6]. Python-based sentiment
analysis presents its results in the form of graphs and bar charts
for easy visualization [2].


Motivated by the existing research, the authors
proposed a system that can be employed to extract and
predict the ratings and comments of a product. Sometimes the rating
given by a customer does not furnish a proper
justification. In this research work, the authors investigate
the sentiment of smartphone (Galaxy S20 5G)
reviews from Amazon.in, which is valuable for both customers
and manufacturers. Sentiment analysis analyzes the
reviews and labels them, i.e. 'better' and 'worse' sentiment as
positive and negative respectively [8].


OBJECTIVE OF THE WORK 


a. To identify the customers' satisfaction level with a specific
product.


b. To increase awareness of the product quality among a
specific group of customers in order to enlarge sales.


c. To give a new buyer relevant information from the existing
users of a specific product for making a purchase
decision.


THE PROPOSED WORK 


The authors proposed and developed a system to extract
real-time product reviews from the e-marketing website
(amazon.in), given the URL of a particular product. The system also
analyzes the reviews and visualizes their underlying sentiment with the
help of the Python-based NLTK, Beautiful Soup and word cloud tools. A
confusion matrix is also used to find the accuracy of the
sentiment classification and other statistical measures.


METHODS AND MODELS 


The set of reviews for a product changes dynamically and
must be captured for sentiment analysis. The
authors devised a web-scraping program to extract
the reviews for the selected Amazon products. It is a Python
program that uses the Beautiful Soup package, a
Python library for pulling data out
of HTML and XML files. A parse tree is created from the page
source code to extract the product review data hierarchically.
The prerequisites for running this review extraction
program are:


• Chrome Driver path, based on the system OS and
  directory path

• Selenium WebDriver to be imported

• Beautiful Soup to be imported


The inputs of this program are the review page links for the
selected Amazon products. The output of the program is a .csv
file containing all the reviews shared by Amazon customers.
For each review that a customer shares on a purchased product,
Amazon stores the heading title, rating, date, review
text and found-helpful flag. The program
extracts all of these fields into the .csv file. After extracting
the customer feedback from Amazon, the authors
implemented a sentiment analysis program using the NLTK
utilities available in Python. The input for this program is
the extracted Amazon product reviews in .csv format. The
output is visualized with a bar chart and a
word cloud. The authors classified the reviews by polarity
(positive, negative and neutral) and used a confusion matrix
to measure the accuracy of that classification.
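
As a rough illustration of the extracted fields described above, the following sketch normalizes the raw strings that appear in the scraped reviews (for example "4.0 out of 5 stars" and "125 people found this helpful") and writes the five fields to CSV. It uses only the Python standard library; the helper names `parse_rating`, `parse_helpful` and `write_reviews` are hypothetical and are not taken from the authors' program, whose Selenium and Beautiful Soup scraping step is not reproduced here.

```python
import csv
import io
import re

# Column names as they appear in the extracted .csv file (Fig 1).
FIELDS = ["Title", "Rating", "Date", "Text", "Found_Helpful"]


def parse_rating(raw):
    """Turn a string such as '4.0 out of 5 stars' into the float 4.0."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of 5", raw)
    return float(match.group(1)) if match else None


def parse_helpful(raw):
    """Turn '125 people found this helpful' into the integer 125."""
    text = raw.strip().lower()
    if text.startswith("one person"):
        return 1
    match = re.match(r"([\d,]+)\s+people", text)
    return int(match.group(1).replace(",", "")) if match else 0


def write_reviews(rows, handle):
    """Write review dictionaries to an open file-like object as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# One review in the raw form shown in Fig 2.
raw_rows = [
    {
        "Title": "Murugeshan Thevar",
        "Rating": "4.0 out of 5 stars",
        "Date": "Reviewed in India on 22 March 2021",
        "Text": "The phone is really great, I had purchased it ...",
        "Found_Helpful": "125 people found this helpful",
    }
]

buffer = io.StringIO()  # stands in for an open .csv file
write_reviews(raw_rows, buffer)
print(buffer.getvalue())
print(parse_rating(raw_rows[0]["Rating"]), parse_helpful(raw_rows[0]["Found_Helpful"]))
```

Keeping the raw strings in the file and converting them only at analysis time is one possible design; converting before writing would work equally well.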


RESULTS AND DISCUSSION 


The developed system works well: with the help of this
program we are able to extract product reviews from the
e-marketing website (amazon.in). Here we inserted the Samsung
Galaxy S20 5G smartphone link
(www.amazon.com/Samsung-Unlocked-Fingerprint-Recognition-Long-Lasting/dp/B082XXKZRC/ref=cm_cr_arp_d_product_top?ie=UTF8),
which the program loads with driver.get(p_url), in the input URL
section of the developed program on 5 May 2021 and ran the
program. The output is automatically saved under the name
'Review_All_Amazon_Product.csv', shown in Fig 1. Fig 1 shows
the reviewer's name, the date of the review, the rating given
to the product, the comment on the product and
how many people found the review helpful for
their decision making.
Title        Rating       Date       Text          Found_Helpful
Akash Por    1.0 out of   Reviewed   Camera qu     308 people found this helpful
Murugesh     4.0 out of   Reviewed   The phone     125 people found this helpful
YENNI SA     3.0 out of   Reviewed   Not upto      112 people found this helpful
JDas         1.0 out of   Reviewed   The phon      87 people found this helpful
Amazon C     3.0 out of   Reviewed   ProsSmoo      67 people found this helpful
karthik      4.0 out of   Reviewed   Killing loo   60 people found this helpful
Aditya Pri   4.0 out of   Reviewed   Just 5G is    56 people found this helpful
Soumyadi     5.0 out of   Reviewed   The came      55 people found this helpful
rahul        4.0 out of   Reviewed   Buying a      46 people found this helpful
Mr. Hashn    5.0 out of   Reviewed   I'm in love   44 people found this helpful
Amazon C     3.0 out of   Reviewed   Greetings     39 people found this helpful
Enthu        4.0 out of   Reviewed   Review b      33 people found this helpful
Mahesh       5.0 out of   Reviewed   Part 1 Rev    29 people found this helpful
Shailendr    5.0 out of   Reviewed   Switched      29 people found this helpful
sujith kun   2.0 out of   Reviewed   Camera qu     30 people found this helpful
PRASKUT      4.0 out of   Reviewed   Excellent     26 people found this helpful
Amardee      2.0 out of   Reviewed   This is rev   23 people found this helpful
saurabh      5.0 out of   Reviewed   In love wi    25 people found this helpful
Amazon C     5.0 out of   Reviewed   Superb ph     25 people found this helpful
mukesh l     2.0 out of   Reviewed   Heating p     24 people found this helpful
Dharma       1.0 out of   Reviewed   Worst bat     22 people found this helpful
Ashish Shi   1.0 out of   Reviewed   I'm writin    19 people found this helpful
Ashish Bh    3.0 out of   Reviewed   I have bee    19 people found this helpful
Ashish Mc    5.0 out of   Reviewed   "Review a     17 people found this helpful

Fig 1: Website's extracted reviews in .csv format


In the second part of the developed program, the .csv file must be
supplied in the input section; the program then performs the
sentiment analysis on the product reviews extracted in the
first part. Fig 2 shows the .csv file converted
into a table.


   Title              Rating              Date                                Text                                               Found_Helpful
0  Akash Porwal       1.0 out of 5 stars  Reviewed in India on ...            Camera quality is very poor, 108 MP camera @ i...  308 people found this helpful
1  Murugeshan Thevar  4.0 out of 5 stars  Reviewed in India on 22 March 2021  The phone is really great, I had purchased it...   125 people found this helpful
2  YENNI ...          3.0 out of 5 stars  Reviewed in India on ...            Not upto the mark, compared to pro max only pr...  112 people found this helpful
3  JDas               1.0 out of 5 stars  Reviewed in India on 22 March 2021  The phone is awesome. Good camera and good dis...  87 people found this helpful
4  Amazon Customer    3.0 out of 5 stars  Reviewed in India on 22 March 2021  ProsSmooth displaySoundAmoled displaySome spec...  67 people found this helpful

Fig 2: Website's extracted reviews in a table


Fig 3 shows the review process, i.e. how the stop words, 'be'
verbs, adverbs and adjectives are removed from the text
review sentences. It also shows the text, the rating and the
polarity rating of each review.

Fig 4 shows the reviews more clearly, with each numerical
rating placed alongside the corresponding text
review.


[Table with columns Text and Rating; the review texts of the first five rows are:]

0  Camera quality is very poor, 108 MP camera @ i...
1  The phone is really great, I had purchased it ...
2  Not upto the mark, compared to pro max only pr...
3  The phone is awesome. Good camera and good dis...
4  ProsSmooth displaySoundAmoled displaySome spec...

Fig 4: Text review and corresponding rating


Fig 5 shows a bar chart of all the reviews. This chart
focuses on the numerical rating, i.e. the X axis represents the
rating (1 star, 2 stars, 3 stars, 4 stars and 5 stars) and the Y
axis represents the count of the reviews.


Out[9]: <AxesSubplot:xlabel='Rating', ylabel='count'>

[Bar chart: x-axis Rating (1 to 5), y-axis count]

Fig 5: Representation of rating and count
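
The counts plotted in Fig 5 (number of reviews per star rating) can be sketched as below. This is a minimal standard-library illustration, not the authors' plotting code: the function name `rating_counts` is hypothetical, and the sample ratings are simply those of the first five reviews listed in Fig 1.

```python
from collections import Counter


def rating_counts(ratings):
    """Count how many reviews fall on each star value from 1 to 5."""
    counts = Counter(int(rating) for rating in ratings)
    # Include stars with zero reviews so the x-axis is always 1..5.
    return {star: counts.get(star, 0) for star in range(1, 6)}


# Ratings of the first five reviews shown in Fig 1.
sample_ratings = [1.0, 4.0, 3.0, 1.0, 3.0]

# A text rendering of the bar chart: one row per star value.
for star, count in rating_counts(sample_ratings).items():
    print(f"{star} star | {'#' * count} ({count})")
```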


[Table with columns Text, Rating, Polarity_Rating, Review and Review_Processed; the legible entries are listed row by row.]

Row 0
  Text:             Camera quality is very poor, 108 MP camera @ i...
  Polarity_Rating:  Negative
  Review:           Camera quality poor 108 MP camera @ working 6 ...
  Review_Processed: Camera quality poor 108 MP camera @ work 6 MP ...

Row 1
  Text:             The phone is really great, I had purchased it ...
  Polarity_Rating:  Positive
  Review:           phone really great purchased first flash sale ...
  Review_Processed: phone really great purchase first flash sale ...

Row 2
  Text:             Not upto the mark, compared to pro max only pr...
  Review:           upto mark compared pro max pro better buy ...
  Review_Processed: upto mark compare pro max pro good buy 20000 v...

Row 3
  Text:             The phone is awesome. Good camera and good dis...
  Polarity_Rating:  Negative
  Review:           phone awesome Good camera good display use fea...
  Review_Processed: phone awesome Good camera good display use fea...

Row 4
  Text:             ProsSmooth displaySoundAmoled displaySome spec...
  Review:           ProsSmooth displaySoundAmoled displaySome spec...
  Review_Processed: ProsSmooth displaySoundAmoled displaySome spec...

Fig 3: Review process


Out[35]:

[Table with columns Polarity_Rating, Review_Processed, score and compound; the legible entries are listed row by row.]

Row 0
  Review_Processed: Camera quality poor 108 MP camera @ work 6 MP ...
  score:            {'neg': 0.383, 'neu': 0.617, 'pos': 0.0, 'comp...

Row 1
  Review_Processed: phone really great purchase first flash sale ...
  score:            {'neg': 0.084, 'neu': 0.648, 'pos': 0.269, 'co...

Row 2
  Review_Processed: upto mark compare pro max pro good buy 20000 v...
  score:            {'neg': 0.0, 'neu': 0.67, 'pos': 0.33, 'compou...

Row 3
  Review_Processed: phone awesome Good camera good display use fea...
  score:            {'neg': 0.158, 'neu': 0.71, 'pos': 0.133, 'com...

Row 4
  Review_Processed: ProsSmooth displaySoundAmoled displaySome spec...
  score:            {'neg': 0.0, 'neu': 0.734, 'pos': 0.266, 'comp...

Fig 6: Polarity rating of text reviews
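
The stop-word removal step illustrated in Fig 3 (for example "Camera quality is very poor, ..." becoming "Camera quality poor ...") can be sketched as follows. This is a simplified standard-library illustration: the small `STOP_WORDS` set and the function name `remove_stop_words` are assumptions made for the example, whereas a fuller stop-word list such as the one shipped with NLTK would normally be used. The further normalization visible in the Review_Processed column (for example "purchased" becoming "purchase") is not reproduced here.

```python
# A deliberately small, illustrative stop-word set (an assumption for this
# sketch; it is far shorter than a real stop-word list).
STOP_WORDS = {
    "a", "an", "and", "had", "i", "in", "is", "it",
    "of", "the", "this", "to", "very", "with",
}


def remove_stop_words(text):
    """Drop stop words and surrounding punctuation, keeping word order."""
    kept = []
    for token in text.split():
        word = token.strip(",.!?;:")  # trim punctuation attached to the word
        if word and word.lower() not in STOP_WORDS:
            kept.append(word)
    return " ".join(kept)


print(remove_stop_words("Camera quality is very poor, 108 MP camera"))
print(remove_stop_words("The phone is really great, I had purchased it"))
```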
[Table with columns Text, Rating, Polarity_Rating, Review, Review_Processed, score and compound; the legible entries of the last rows are listed row by row.]

Row 1149
  Text:             Nice camera nice looking and big battery is os...
  Polarity_Rating:  Positive
  Review:           Nice camera nice looking big battery osm price ...
  Review_Processed: Nice camera nice look big battery osm price ra...
  score:            {'neg': 0.0, 'neu': 0.533, 'pos': 0.467, 'comp...

Row 1150
  Text:             Nice camera nice looking and big battery is os...
  Polarity_Rating:  Positive
  Review:           Nice camera nice looking big battery osm price ...
  Review_Processed: Nice camera nice look big battery osm price ra...
  score:            {'neg': 0.0, 'neu': 0.533, 'pos': 0.467, 'comp...

Row 1151
  Text:             Phone is really good with respect to the amoun...
  Polarity_Rating:  Positive
  Review:           Phone really good respect amount getting use week using...
  Review_Processed: Phone really good respect amount get use week ...
  score:            {'neg': 0.0, 'neu': 0.664, 'pos': 0.336, 'comp...

991 rows x 8 columns


Fig 7: Compound polarity score


Fig 6 shows the whole review process more clearly.
First it shows the polarity rating (negative or positive) of
the review; then it analyzes the review sentence by sentence and
gives the negative, positive and neutral share of the sentiment
of each review sentence. The last column gives the total sentiment
score of the review. From Fig 7 it is observed that the
compound sentiment and the polarity rating give the same
information for each customer's review. If the two pieces of
information (rating and text) do not match, we may
consider that the customer has not given an accurate review of
the said product.
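
The comparison described above, between the label implied by the star rating and the label implied by the compound sentiment score, can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the cut-offs (a compound score at or beyond ±0.05, four stars or more as positive, two stars or fewer as negative), the function names, and the sample values, which are invented and are not taken from the paper's data.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label (assumed cut-offs)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def label_from_rating(stars):
    """Map a 1-5 star rating to a sentiment label (assumed cut-offs)."""
    if stars >= 4:
        return "Positive"
    if stars <= 2:
        return "Negative"
    return "Neutral"


def find_mismatches(reviews):
    """Return the reviews whose rating label and text label disagree."""
    return [
        review
        for review in reviews
        if label_from_rating(review["rating"]) != label_from_compound(review["compound"])
    ]


# Invented sample values, for illustration only.
sample_reviews = [
    {"rating": 5.0, "compound": 0.62},   # rating and text agree
    {"rating": 1.0, "compound": 0.31},   # low rating, positive wording
    {"rating": 2.0, "compound": -0.44},  # rating and text agree
]

mismatches = find_mismatches(sample_reviews)
print(mismatches)
```

Reviews returned by `find_mismatches` are the candidates for the case the text describes, where the numerical rating and the written comment point in different directions.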


[Word cloud; legible words include: performance, device, screen, mobile, phone, quality, camera, awesome, battery, look, feel, best, great, value, money]

Fig 8: Word cloud for positive reviews


[Word cloud; legible words include: work, time, touch, Redmi, use, display, camera, quality, issue]

Fig 9: Word cloud for negative reviews
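
The word clouds in Fig 8 and Fig 9 size each word by how often it occurs. The frequency table behind such a cloud can be sketched with the standard library as below; the function name `word_frequencies` is hypothetical, the two sample strings are processed reviews taken from Fig 3, and the rendering of the cloud itself is not shown.

```python
from collections import Counter


def word_frequencies(processed_reviews):
    """Count case-insensitive word occurrences across processed reviews."""
    counts = Counter()
    for review in processed_reviews:
        counts.update(word.lower() for word in review.split())
    return counts


# Two processed reviews as they appear in Fig 3.
processed = [
    "phone really great purchase first flash sale",
    "phone awesome Good camera good display use",
]

frequencies = word_frequencies(processed)
# Words with the highest counts would be drawn largest in the cloud.
print(frequencies.most_common(5))
```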














In [40]: accuracy_score(rev_df_m['Polarity_Rating'], rev_df_m['comp_rating'])


Out[40]: 0.7951564076690212


In [41]: print(classification_report(rev_df_m['Polarity_Rating'], rev_df_m['comp_rating']))


              precision    recall  f1-score   support

    Negative       0.83      0.46      0.59       322
    Positive       0.79      0.96      0.86       669

    accuracy                           0.80       991
   macro avg       0.81      0.71      0.73       991
weighted avg       0.80      0.80      0.78       991


In [ ]: print(confusion_matrix(rev_df_m['Polarity_Rating'], rev_df_m['comp_rating']))


[[149 173]
 [ 30 639]]


Fig 10: Accuracy and other statistical measures


Fig 8 and Fig 9 show the positive word cloud and the
negative word cloud. They are the visible result of the developed
program. In these word clouds, words in a big, bold font
carry the information that occurs most strongly, while words in a
small, normal font carry comparatively weak
information. The word clouds are drawn automatically,
depending on the frequency of the words, so anyone can
understand the sentiment of the product reviews
at a glance. In Fig 8, the positive word cloud says that the
phone in question is good in terms of its battery,
performance, display, etc., and in Fig 9 the negative word
cloud says that the phone is good but that its camera
is bad. If we analyze Fig 8 and
Fig 9 together, it is clear that customers are
comparing the Samsung smartphone with the Redmi
smartphone. The manufacturer therefore needs to focus on the camera
and on the other features offered by Redmi. In Fig 10 the authors show the
accuracy of the sentiment analysis and other statistical
measures. The accuracy of the sentiment analysis for this
particular product is 0.795. Fig 10 also shows
the precision, recall, f1-score and support in columns,
against rows for negative, positive, accuracy, macro
avg. and weighted avg.
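
As a worked check, the accuracy, precision, recall and f1-score reported in Fig 10 can be recomputed directly from the confusion matrix [[149, 173], [30, 639]]. The sketch below is a plain-Python illustration and not the authors' code; the function name `per_class_metrics` is hypothetical, and it assumes that rows are the rating-based labels (Negative, Positive) and columns are the text-based labels, which is consistent with the supports of 322 and 669.

```python
def per_class_metrics(matrix, labels):
    """Compute accuracy and per-class precision, recall and f1 from a
    confusion matrix whose rows are true labels and columns predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum (support)
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return correct / total, report


accuracy, report = per_class_metrics([[149, 173], [30, 639]], ["Negative", "Positive"])
print(f"accuracy = {accuracy:.4f}")
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

The low recall for the negative class (149 of 322) reflects the 173 reviews with a negative rating whose text was scored as positive.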




CONCLUSION 


Customer feedback analysis in e-business is a form of web-data
mining that analyzes customers' sentiments, attitudes and
emotions towards a particular product or service. The data
of online product reviews for this research were collected
from Amazon.com for a particular product over a
predefined time interval. The uniqueness of this work is that it finds
the overall polarity of an enormous dataset in a few seconds
with the help of a newly developed automated system.


In this research work, Python-based NLTK has been used for
the sentiment analysis of the comments. The newly developed
application has been used to gain insight into users'
reactions to a specific product, based on their
ratings and comments. The application further
provides focused semantic information, namely a positive
feedback class, a neutral feedback class and a negative
feedback class, with reference to the customers' feedback. The results
have been presented in the form of graphs, bar charts and a word
cloud so that they are easy to visualize and understandable to
ordinary people.


This analytical research work helps to reveal the
actual rating of a product when a customer's text review
contradicts the numerical rating. This insight from our
experiment is beneficial for customers, retailers,
competitors and manufacturers alike.


In a nutshell, this work is generic and can be used for any
given product on predefined data. In particular, this
experiment was done for the product model Samsung Galaxy
S20 5G with reference to the reviews collected on 09.05.2021
using our newly developed Python-based, NLTK-augmented
system software. The results suggest that customers
recommend this product to future buyers with reference to
value for money. However, the results also suggest that the
manufacturer should improve the quality of the charger for better
market capitalization.


This empirical study is just a first step in this research
domain. Future researchers can explore the validity of our
findings for a different product or for an
e-business portal other than Amazon. The authors of this
study expect future researchers to explore the impact of
other types of feedback and the use of other techniques
that have not been used in this paper due to a paucity of
time and other resource limitations.